Large Vocabulary Audio-Visual Speech Recognition Using Active Shape Models

نویسندگان

  • Tanveer A. Faruquie
  • Abhik Majumdar
  • Nitendra Rajput
  • L. Venkata Subramaniam
چکیده

Orthogonal information present in the video signal associated with the audio helps in improving the accuracy of a speech recognition system. Audio-visual speech recognition involves extraction of both the audio as well as visual features from the input signal. Extraction of visual parameters is done by the recognition of speech dependent features from the video sequence. This paper uses geometrical features to describe the lip shapes. Curve-based Active Shape Models are used to extract the geometry. These geometrically represented visual parameters are used along with the audio cepstral features to perform an audio-visual classification. It is shown that the bimodal system presented here gives an improvement in the classification results over classification using only the audio features.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Large-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop

We report a summary of the Johns Hopkins Summer 2000 Workshop on audio-visual automatic speech recognition (ASR) in the large-vocabulary, continuous speech domain. Two problems of audio-visual ASR were mainly addressed: Visual feature extraction and audio-visual information fusion. First, image transform and model-based visual features were considered, obtained by means of the discrete cosine t...

متن کامل

Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit

This paper describes audio-visual speech recognition experiments on a multi-speaker, large vocabulary corpus using the Janus speech recognition toolkit. We describe a complete audio-visual speech recognition system and present experiments on this corpus. By using visual cues as additional input to the speech recognizer, we observed good improvements, both on clean and noisy speech in our experi...

متن کامل

Asynchronous stream modeling for large vocabulary audio-visual speech recognition

This paper addresses the problem of audio-visual information fusion to provide highly robust speech recognition. We investigate methods that make different assumptions about asynchrony and conditional dependence across streams and propose a technique based on composite HMMs that can account for stream asynchrony and different levels of information integration. We show how these models can be tr...

متن کامل

Using Likelihood L-statistic as Confidence Measure in Audio-visual Speech Recognition

This paper describes recent work on decision fusion in audio-visual speech recognition. In this work, a novel approach is proposed to combine audio and video channels information in audio-visual speech recognition scenario. For simplicity, we have only considered frame-level phonetic classification problem using two singlestream Gaussian Mixture Model (GMM). Audio and video streams are adaptive...

متن کامل

Improving lip-reading performance for robust audiovisual speech recognition using DNNs

This paper presents preliminary experiments using the Kaldi toolkit [1] to investigate audiovisual speech recognition (AVSR) in noisy environments using deep neural networks (DNNs). In particular we use a single-speaker large vocabulary, continuous audiovisual speech corpus to compare the performance of visual-only, audio-only and audiovisual speech recognition. The models trained using the Kal...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000